SUMMARY


Air  Force  management  assures  a  high  quality  workforce  by  maintaining  appropriate  entry-level 
aptitude  standards.  In  the  early  1980s,  the  Air  Force  Human  Resources  Laboratory  (AFHRL) 
developed  a  method  to  determine  minimum  aptitude  test  scores,  based  on  Occupational  Learning 
Difficulty (OLD), to be used as a screening criterion to qualify incoming airmen. The method
involved  procedures  referred  to  as  "difficulty  benchmarking."  It  required  a  team  of  occupational 
analysts  to  become  familiar  with  "benchmark"  rating  scales  (25-point  scales  with  tasks  of  varying 
learning  difficulties  from  different  specialties  within  a  given  aptitude  area).  Team  members 
would  observe  tasks  from  any  given  specialty  and  rate  them  for  learning  difficulty  against  the 
benchmark  scale.  This  enabled  relative  measurement  of  the  learning  difficulty  of  specialties 
within a given aptitude area. The method proved too expensive in terms of funds and man-hours to
be practical. This paper describes the research effort to develop affordable and creditable
procedures for determining OLD by investigating the use of judgmental task learning difficulty
ratings by subject-matter experts.


PREFACE


This work was completed under Work Unit 77191911, Measurement and Analysis of Job
and  Mission  Requirements.  The  paper  was  presented  at  the  30th  Annual  Conference  of  the 
Military  Testing  Association  and  published  in  the  proceedings  of  that  event. 

AFFORDABLE AND CREDITABLE PROCEDURES FOR
DETERMINING  OCCUPATIONAL  LEARNING  DIFFICULTY 


I.  INTRODUCTION 

New  technologies  are  needed  to  estimate  Manpower,  Personnel,  and  Training  (MPT)  requirements 
and  tradeoffs  during  the  planning  and  conceptual  development  stages  of  new  and  modified  weapon 
systems.  To  this  end,  an  extremely  useful  decision-making  tool  is  the  measure  of  Occupational 
Learning  Difficulty  (OLD)  for  each  Air  Force  specialty  (AFS).  The  measure  of  occupational 
learning  difficulty  is  used  for  setting  appropriate  aptitude  standards,  as  stated  in  Air  Force 
Regulation (AFR) 39-1, Airman Classification, for both entry-level and cross-specialty transfer
requirements. It is also used in the Air Force person-job-match algorithms for determining
individual  assignments  to  specialties,  as  described  by  Weeks  (1984). 

The work behind the OLD measure began in 1973, when the Air Force Military
Personnel  Center  (AFMPC)  requested  that  the  Air  Force  Human  Resources  Laboratory  (AFHRL)  conduct 
research  to  develop  an  objective  procedure  to  aid  in  establishing  relative  aptitude  requirements 
for enlisted occupations. After extensive research and development (R&D), AFHRL developed a
technology that produced measures of OLD. Occupational learning difficulty is defined as "the
time it takes to learn to perform an occupation satisfactorily" (Mead & Christal, 1970).

In deriving measures of OLD, three types of occupational information were employed: (a) task
time-spent ratings provided by incumbents, (b) supervisory ratings of task difficulty, and (c)
benchmark ratings of task learning difficulty obtained through evaluations by contractor
personnel. The first two measures are available from the USAF Occupational Measurement Center
(USAFOMC). Benchmark ratings were necessary because supervisory ratings of task difficulty
provided information only about the relative order of tasks within occupations.
Consequently, supervisory ratings were not comparable across occupations. By contrast, benchmark
ratings of task learning difficulty, which are based on task-anchored benchmark rating scales
(Burtch, Lipscomb, & Wissman, 1982), are comparable across occupations within a given aptitude
area.  Benchmark  ratings  were  collected  by  contractor  personnel  for  this  purpose.  Occupational 
learning  difficulty  measures  were  derived  for  more  than  200  enlisted  AFSs. 

Following the initial data collection by contractor personnel, research was undertaken
(Garcia, Ruck, & Weeks, 1985) to enable the transfer of this technology to an operational
setting. The procedure thus developed used Air Force personnel from USAFOMC to routinely collect
benchmark ratings. This was to provide up-to-date learning difficulty estimates for any AFS.
Teams of USAFOMC staff members would conduct interviews/task observations and then rate the tasks
on the 25-point benchmark scales. This procedure proved impractical to support because of its
travel and man-hour costs. Consequently, it was never fully implemented.

To properly transfer the learning difficulty measurement technology from a research to an
operational setting, it is necessary to develop a practical (quick and inexpensive) procedure for
collecting reliable task difficulty data on the 25-point benchmark scale. One potential solution
is to develop a procedure involving mail surveys to collect judgmental ratings from
Subject-Matter  Experts  (SMEs).  This  paper  describes  the  approach  taken  and  results  obtained  in 
the  quest  for  a  solution  using  mail  surveys. 

The research was conducted in two phases. Phase I involved only one AFS and was primarily
aimed at developing and testing the survey instrument and procedures while providing initial data
for analysis. Phase II involved the collection and analysis of data for eight AFSs using the
modified survey instrument and procedures based on experience gained in the pilot study.


II. PHASE I


Method 


Three  criteria  were  used  in  selecting  the  single  AFS  for  analysis  in  Phase  I.  First, 
original  benchmark  learning  difficulty  ratings  collected  by  the  contractor  personnel  had  to  be 
available  for  the  AFS.  Second,  there  should  have  been  minimal  or  no  change  to  the  structure  of 
the  AFS  since  the  original  study.  Finally,  there  should  have  been  no  significant  change  to  the 
nature  of  tasks  completed  by  the  specialty.  This  would  ensure  that  current  incumbents  understood 
the  tasks  that  were  being  rated.  The  specialty  selected  was  Instrumentation  Mechanic,  Air  Force 
Specialty  Code  (AFSC)  316X3.  This  AFSC  is  from  the  electronics  aptitude  area. 

The mailable survey instrument consisted of five items: (a) a motivational cover letter with
detailed completion instructions; (b) a background information sheet for the collection of
demographic data about each rater; (c) the electronics benchmark rating scale booklet, which
contained the 25-point rating scale and explanations of each task as developed by Burtch et al.;
(d) the list of 60 tasks to be rated on a rating form (these 60 tasks were the same 60 originally
used by the contractor); and (e) a follow-up information form designed to elicit useful
information about the understandability of the survey completed.

Five  groups  of  raters  were  selected  to  complete  the  survey:  (a)  40  SMEs,  7-  and  9-skill  level 
enlisted  members  randomly  selected  from  AFSC  316X3;  (b)  10  AFHRL  behavioral  scientists,  all  with 
research  experience;  (c)  7  occupational  analysts  from  USAFOMC,  all  with  experience  in  developing 
and/or analyzing occupational surveys; (d) 7 training developers from USAFOMC, all with
experience in task analysis and in the USAF training system; and (e) 6 novices, predominantly
young  adults  with  little  or  no  Air  Force  experience.  Groups,  except  the  SME  group,  consisted  of 
civilian,  military  officer,  and  military  enlisted  members.  Surveys  were  sent  to  the  above  groups. 

Analyses  of  the  data  collected  in  the  pilot  study  had  three  major  goals:  (a)  to  provide 
suggestions for improvement/modification to the survey instrument, (b) to determine the validity
of  the  SMEs'  responses,  and  (c)  to  compare  measures  of  OLD  generated  from  SME  benchmark  data  with 
those  generated  from  contractor  data. 


Results 

Only 23 of the 40 questionnaires mailed to the 316X3 SMEs (57.5%) were returned.
Nonetheless, there were sufficient responses to provide a reasonable basis to meet the goals of
Phase  I.  Response  rates  for  the  other  non-SME  groups  were  very  good,  with  almost  everyone 
responding.  Responses  to  the  follow-up  information  form  suggested  that  no  significant  changes  to 
the  survey  package  were  required.  Responses  for  each  of  the  respondent  groups  were  averaged  to 
form  a  mean  group  rating.  Intercorrelations  between  SME  mean  group  ratings,  contractor  ratings, 
and  original  supervisor  task  difficulty  ratings  from  the  USAFOMC  Occupational  Analysis  study  were 
calculated  using  the  ASCII  Comprehensive  Occupational  Data  Analysis  Programs  (CODAP)  CURVES 
program.¹


¹ The original CODAP system developed by Christal (1974) has been expanded and rewritten in
ASCII. The present system, including CURVES, is described in detail in users' manuals stored on
computer files at AFHRL, Brooks AFB, TX.

As can be seen in Table 1, the SME ratings correlated well with contractor ratings (r = .75)
and even better with supervisor ratings (r = .86). The contractor ratings correlated at the same
level as the SME ratings with the supervisor ratings (r = .75).


Table 1.  Correlations (r) and Average Group Ratings

                       r with       r with       Avg rating   Avg abs dif
                       contractor   supervisor   (across 60   with contractor
Group                  ratings      ratings      tasks)       ratings

Contractor             1.00         .75          13.13        0.0
316X3 SMEs             .75          .86          14.03        1.78
AFHRL Scientists       .87          .83          12.77        1.82
OMC Occup Analysts     .79          .84          12.03        2.08
OMC Tng Developers     .74          .77          12.40        1.90
Novices                .55          .45          11.68        2.63
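
The quantities reported in Table 1 can be illustrated with a minimal Python sketch. It is not the CODAP CURVES program. The function names and the small rating matrix are hypothetical, chosen only to show how a mean group rating, a Pearson correlation, and an average absolute difference might be computed from task ratings on the 25-point scale.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def group_mean_ratings(ratings_by_rater):
    """Average each task's rating across raters (one inner list per rater)."""
    n_tasks = len(ratings_by_rater[0])
    return [mean([rater[t] for rater in ratings_by_rater]) for t in range(n_tasks)]


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def avg_abs_difference(x, y):
    """Mean of the task-by-task absolute differences between two rating lists."""
    return mean([abs(a - b) for a, b in zip(x, y)])


# Hypothetical data: three raters rating four tasks (not from the study).
sme_raters = [[10, 14, 18, 22], [12, 15, 17, 24], [11, 16, 19, 23]]
contractor = [9, 15, 16, 22]  # hypothetical contractor benchmark ratings

sme_mean = group_mean_ratings(sme_raters)
print("mean group rating per task:", sme_mean)
print("r with contractor ratings:", round(pearson_r(sme_mean, contractor), 2))
print("average rating across tasks:", round(mean(sme_mean), 2))
print("avg abs difference:", round(avg_abs_difference(sme_mean, contractor), 2))
```

In the study itself each group's list would hold 60 task means rather than four.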

It was important that the SME ratings fall close to the criterion level on the 25-point
scale. The average SME rating across all 60 tasks was 14.03, slightly above the contractor
average rating of 13.13 (see Table 1). An OLD value was generated for both SME and contractor
rating data. These were determined using ASCII CODAP and were the Average Task Difficulty Per
Unit Time Spent (ATDPUTS) of first-term airmen multiplied by 10. This procedure uses task
difficulties established on the 25-point benchmark scale for all tasks in the AFS. These
benchmarked values are extrapolated from the best-fit linear relationship between the original
supervisor task difficulty ratings and the 60 benchmarked tasks. The occupational learning
difficulty values were 129 using SME data and 122 using contractor data.
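
The OLD computation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the ASCII CODAP implementation: ATDPUTS is assumed here to be the time-weighted mean of task difficulties, the fitted line is applied to every task (including the benchmarked ones), and all names and numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx


def occupational_learning_difficulty(supervisor, time_spent, bench_idx, bench_ratings):
    """Return 10 * ATDPUTS under the assumptions stated in the lead-in.

    supervisor    -- supervisor task difficulty rating for every task in the AFS
    time_spent    -- relative time first-term airmen spend on each task
    bench_idx     -- indices of the tasks that also have benchmark ratings
    bench_ratings -- 25-point benchmark ratings for those tasks
    """
    # Best-fit line from supervisor ratings to the benchmark scale.
    slope, intercept = fit_line([supervisor[i] for i in bench_idx], bench_ratings)
    # Place every task in the AFS on the 25-point benchmark scale.
    benchmarked = [slope * s + intercept for s in supervisor]
    # Time-weighted average difficulty (assumed definition of ATDPUTS).
    atdputs = sum(d * t for d, t in zip(benchmarked, time_spent)) / sum(time_spent)
    return 10 * atdputs


# Hypothetical five-task specialty; three tasks carry benchmark ratings.
supervisor = [2.0, 4.0, 5.0, 6.0, 8.0]
time_spent = [30, 20, 20, 20, 10]  # percent of time, first-term airmen
old = occupational_learning_difficulty(supervisor, time_spent, [0, 2, 4], [6.0, 12.0, 18.0])
print(round(old, 1))
```

Because the result is a weighted mean on the 25-point scale multiplied by 10, values such as the 129 and 122 reported above correspond to average benchmarked difficulties of 12.9 and 12.2.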

The  four  non-SME  group  mean  ratings  correlated  well  with  the  contractor  and  supervisor 
ratings (see Table 1). The average rating across all 60 tasks was, in each case, below the
contractor rating. The average absolute difference between these four group mean ratings and
contractor ratings ranged from 1.82 to 2.63, all larger than the SMEs' 1.78.

The non-SME groups rated surprisingly accurately. This is explained by the
"understandability" of most of the 60 tasks in relation to the benchmark tasks. Overall, the
mail survey procedure showed merit, as SMEs were able to produce accurate, although slightly
inflated, ratings.


III.  PHASE  II 


Method 


Eight  AFSs  were  selected  for  Phase  II.  The  same  selection  criteria  were  used  as  in  Phase  I. 
Two  AFSs  were  selected  from  each  of  the  four  aptitude  areas  of  general,  administrative, 
electronics,  and  mechanical.  Of  these  two  in  each  aptitude  area,  one  had  a  high  aptitude 
requirement and one had a low aptitude requirement as detailed in AFR 39-1. Those selected were
251X0,  Weather;  272X0,  Air  Traffic  Control;  304X0,  Wideband  Communications  Equipment;  427X0, 
Machinist;  542X1,  Electric  Power  Line;  603X0,  Vehicle  Operator/Dispatcher;  702X0,  Administration; 
and  732X0,  Personnel. 

The mailable survey instrument for Phase II consisted of four items: (a) a motivational
cover letter and instructions as in Phase I; (b) a more detailed background information form
than used in Phase I; (c) either an electronics, mechanical, or general/administrative benchmark
rating scale booklet, dependent on the aptitude area of the AFS in question; and (d) the list of
60 tasks to be rated on a rating form, these being identical to those originally used by the
contractor. Only 40 tasks were used in AFSC 542X1 (as were used by the contractor). Random
selection of 100 SMEs from the 7-skill level (and 9-skill level, where available) of each of the
eight AFSs was made. In addition, the same non-SME respondents as in Phase I were again selected
to complete the survey for AFSC 251X0, Weather.

Analyses of the data collected in Phase II had three major goals: (a) to determine the
internal consistency of the 25-point benchmark task difficulty ratings for the various groups of
raters; (b) to determine the validity of the rater groups using the same validity-measuring
procedures as in Phase I; and (c) to compare measures of aptitude-specific OLD as in Phase I and
non-aptitude-specific occupational learning difficulty (Ramadge, 1987) generated from SME
benchmark data with those generated from contractor data.


Results 


The percentage of useful responses ranged from 22 to 49 across the AFSs studied. Reliability of
ratings was assessed using the ASCII CODAP program GPPREL (originally REXALL) (Christal &
Weissmuller, 1976). This program produces an index of interrater agreement. Reliability indices
were derived for each rater group (Table 2). Interrater reliabilities were .96 or above
except for the two AFSs with the lowest numbers of useful responses. The low useful response
rates and lowest interrater reliabilities, in AFSs 702X0 and 732X0, were due to three factors:
(a) there had been significant movement of personnel in and out of these career fields recently,
increasing the number of less-experienced raters; (b) within these career fields, there are
numerous job types, and many of the raters surveyed were not familiar with the tasks in the task
list; and (c) SMEs were given the option not to rate tasks with which they were not familiar.
When combined with the first two reasons, this resulted in a low average number of tasks rated
per rater.
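
An index of interrater agreement of this general kind can be sketched as follows. This is not the GPPREL algorithm, whose internals are not described here. It is a standard analysis-of-variance reliability estimate (the consistency form of the intraclass correlation) for a complete tasks-by-raters matrix, shown only as a hedged illustration. The function name and the data are hypothetical, and real survey data with skipped tasks would need handling for missing ratings.

```python
def interrater_reliability(ratings):
    """Return (r_11, r_kk) for a complete matrix ratings[task][rater].

    r_11 estimates the reliability of a single rater; r_kk estimates the
    reliability of the mean of all k raters. Both come from a two-way
    ANOVA without replication (tasks x raters).
    """
    n = len(ratings)      # number of tasks
    k = len(ratings[0])   # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    task_means = [sum(row) / k for row in ratings]
    rater_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_tasks = k * sum((m - grand) ** 2 for m in task_means)
    ss_raters = n * sum((m - grand) ** 2 for m in rater_means)
    ss_error = ss_total - ss_tasks - ss_raters

    ms_tasks = ss_tasks / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    r_11 = (ms_tasks - ms_error) / (ms_tasks + (k - 1) * ms_error)
    r_kk = (ms_tasks - ms_error) / ms_tasks
    return r_11, r_kk


# Hypothetical ratings: four tasks (rows) rated by three raters (columns).
ratings = [[10, 12, 11], [14, 15, 16], [18, 17, 19], [22, 24, 23]]
r_11, r_kk = interrater_reliability(ratings)
print(round(r_11, 2), round(r_kk, 2))
```

The reliability of the group mean rises with the number of raters, which is consistent with the lower indices observed for the AFSs with the fewest useful responses.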

Correlations  among  SME,  contractor,  and  supervisor  ratings  (Table  3)  were  calculated  in  the 
same  way  as  in  Phase  I.  Correlations  between  SME  ratings  and  contractor  ratings  ranged  between 
.79  and  .94,  except  for  542X1  (-.08)  and  732X0  (.65).  The  SME  ratings  all  correlated  extremely 
well  (between  .83  and  .94)  with  the  supervisor  ratings.  In  every  case,  SME  ratings  correlated 
better  than  contractor  ratings  with  the  supervisor  ratings.  With  this  in  mind,  the  validity  of 
the contractor ratings for AFSC 542X1 and, to a lesser extent, AFSC 732X0 may be open to question.


IV. CONCLUSION

The mail survey procedure tested in this effort proved successful in that SMEs were able to
produce accurate, although in general slightly inflated, estimates of occupational learning
difficulty. It should now be possible to produce a translation formula to enable the
implementation of this method for collecting accurate benchmark task difficulty ratings.